(54) Signal processing without multiplication in receivers for digital signals



(57) Signal processing operations are performed in a digital communication system receiver on a sequence
of received symbols, each representing a number of information bits. The symbols correspond to points in a
given modulation constellation generated by applying a predetermined rotation, e.g., a 45° rotation, to an
otherwise conventional modulation constellation, e.g., a QPSK constellation, a 16-QAM constellation, etc. The
use of the rotated constellation allows certain signal processing operations, such as filtering,
Least-Mean-Squares (LMS) estimation, and Maximum-Likelihood (ML) sequence detection via the Viterbi algorithm,
to be performed without the need for multipliers. By eliminating or substantially reducing the number of
required multiplication operations, the invention significantly reduces the complexity and delay associated
with the corresponding signal processing circuitry. Advantageously, this reduction in complexity and delay is
accomplished without the use of any approximation or other reduction in precision.

Description

Field Of The Invention 

[0001] The invention relates generally to digital communication systems and, more particularly, to signal processing
operations, such as filtering, channel estimation, and channel equalization, for use in such systems. 

Background Of The Invention 

[0002] Channel estimation and equalization are important determinants of the quality of achieved data throughput
in a digital communication system receiver. In conventional receivers, channel estimation is often performed using a
technique known as Least-Mean-Squares (LMS) estimation, while channel equalization is performed using
Maximum-Likelihood (ML) sequence detection via the Viterbi algorithm. A problem with these existing techniques is
that, in their original form, they generally require many complex multiplication operations. More particularly, LMS
estimation requires 2M multiplications, and a full-search Viterbi algorithm requires $P^M$ multiplications, where M is
the order of the channel estimator, and P is the size of the symbol alphabet. The large number of multiplications may
render these conventional estimation and equalization techniques prohibitively expensive in terms of the required
computational resources, particularly at very high data rates.

[0003] Subsequent implementations of the LMS estimation technique have attempted to reduce the required number
of multiplications through the use of signed approximations of a regression vector and/or an error signal, as described
in, e.g., T.A.C.M. Claasen and W.F.G. Mecklenbräuker, "Comparison of the convergence of two algorithms for adaptive
FIR digital filters," IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-29, No. 3, pp. 670-678, June
1981; D.L. Duttweiler, "A twelve-channel digital echo canceler," IEEE Trans. on Communications, Vol. COM-26, No.
5, May 1978; and R.D. Gitlin, J.E. Mazo and M.G. Taylor, "On the design of gradient algorithms for digitally implemented
adaptive filters," IEEE Trans. on Circuit Theory, Vol. 20, No. 2, pp. 125-136, Mar. 1973. These non-linear methods,
however, can alter the training behavior such that the training speed is considerably reduced. As a result, these methods
are typically suitable for use only in applications in which long training sequences are available, e.g., broadcasting
applications.

[0004] It has also been proposed that pipelining be introduced in order to increase the throughput of the implemented
hardware, as described in, e.g., M.D. Meyer and D.P. Agrawal, "A modular pipelined implementation of a delayed LMS
transversal adaptive filter," Proc. of ISCAS, New Orleans, pp. 1943-1946, 1990. A pipelining technique, however, leads
to delayed updates that also corrupt the learning behavior of the LMS algorithm. See, e.g., P. Kabal, "The stability of
adaptive minimum mean square error equalizers using delayed adjustment," IEEE Trans. on Communications, Vol.
COM-31, No. 3, pp. 430-432, Mar. 1983; G. Long, F. Ling and J.A. Proakis, "The LMS algorithm with delayed coefficient
adaptation," IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-37, No. 9, pp. 1397-1405, Sept.
1989; and G. Long, F. Ling and J.A. Proakis, "Corrections to 'The LMS algorithm with delayed coefficient adaptation',"
IEEE Trans. on Signal Processing, Vol. SP-40, No. 1, pp. 230-232, Jan. 1992.

[0005] Although a number of techniques have been developed to compensate for the above-noted corruption, e.g.,
as described in M. Rupp and R. Frenzel, "The behavior of LMS and NLMS algorithms with delayed coefficient update
in the presence of spherically invariant processes," IEEE Trans. on Signal Processing, Vol. SP-42, No. 3, pp. 668-672,
March 1994; and F. Bjarnason, "Noise cancellation using a modified form of the filtered-XLMS algorithm," Proc. Eusipco
Signal Processing V, Brussels, pp. 1053-1056, 1992, such techniques often require even more multiplications.
Straightforward realizations of delayed-update LMS with compensation are described in T. Kimijima, K. Nishikawa and
H. Kiya, "A pipelined architecture for DLMS algorithm considering both hardware complexity and output latency,"
Proc. Eusipco, Patras, Greece, pp. 503-506, Sep. 1998.

[0006] As is apparent from the above, a need exists for improved channel estimation, channel equalization and other
signal processing techniques, which can eliminate or substantially reduce the required number of multiplications,
without significantly altering training behavior, numerical precision or other desirable attributes of the
corresponding algorithms.


Summary Of The Invention 

[0007] The present invention provides efficient computational techniques suitable for use in channel estimation,
channel equalization and other signal processing operations. In accordance with the invention, signal processing
operations are performed in a digital communication system receiver on a sequence of received symbols, each
representing a number of information bits. The symbols correspond to points in a given modulation constellation
generated by applying a predetermined rotation, e.g., a 45° rotation, to an otherwise conventional modulation
constellation, e.g., a QPSK constellation, a 16-QAM constellation, a 64-QAM constellation, a 256-QAM constellation,
a 1024-QAM constellation, etc. The use of the rotated constellation allows certain signal processing operations, such
as Finite Impulse Response (FIR) filtering, Least-Mean-Squares (LMS) estimation, and Maximum-Likelihood (ML) sequence
detection via the Viterbi algorithm, to be performed without the need for multipliers. By eliminating or substantially
reducing the number of required multipliers, the invention significantly reduces the complexity and delay associated
with the corresponding signal processing circuitry.

[0008] In an illustrative embodiment of the invention, the signal processing operation utilizes a selector to implement
a complex multiplication of a channel estimate coefficient with a symbol from a given modulation constellation. The
selector receives as inputs real and imaginary parts of an element of the channel estimate coefficient, and generates
as outputs real and imaginary parts of a product of the element of the channel estimate coefficient and a corresponding
element of a given one of the symbols, without utilizing a multiplication operation. The selector may include, e.g., first
and second switches and first and second add/subtract units, the first and second switches each selecting one of the
real or the imaginary part of the element of the channel estimate coefficient for application to a corresponding one of
the add/subtract units, such that the add/subtract units compute elements of real and imaginary parts of an inner vector
product. An FIR filter operation may be implemented using the selector by including feedback from outputs of the
add/subtract units to corresponding inputs of the add/subtract units. The selectors can be arranged in multi-stage and/or
hierarchical adder tree structures in order to implement the particular processing operations required in a given
application.

[0009] Advantageously, the invention not only can completely eliminate multiplications in certain embodiments, but
also minimizes the number of addition operations required in implementing FIR filtering and other operations prevalent
in channel estimation and equalization algorithms. The invention can be embodied in channel estimation algorithms
such as LMS estimation and equalization algorithms such as ML sequence detection using the Viterbi algorithm, and
preserves the numerical precision of the algorithms while reducing their complexity dramatically. For example, an
illustrative embodiment of the invention requires only selection operations, i.e., no adders or multipliers, to implement
the multiplication of a complex-valued filter tap coefficient and a QPSK constellation point.






Brief Description Of The Drawings 



[0010] FIGS. 1(a) and 1(b) show exemplary conventional and rotated QPSK constellations, respectively.
[0011] FIG. 2 is a table showing the mapping of two bits into a rotated QPSK constellation point in accordance with
the invention.
[0012] FIG. 3 shows a basic selector operation for implementing a complex multiplication of a coefficient with a
symbol from a QPSK constellation, in accordance with the invention.
[0013] FIGS. 4 and 5 illustrate other structures for implementing filter operations in accordance with the invention,
utilizing the basic selector operation of FIG. 3.
[0014] FIG. 6 shows a recursive add/sub operation for complete vector product computation in accordance with the
invention.
[0015] FIGS. 7(a) through 7(f) illustrate the rotation of a 16-QAM constellation, and its separation into four subsets.
[0016] FIG. 8 shows a two-step chain for multiplication in a 16-QAM set in accordance with the invention.
[0017] FIG. 9 shows a three-step chain for multiplication in a 64-QAM set in accordance with the invention.
[0018] FIG. 10 is a table showing the number of add/sub operations required to implement an FIR filter with M
complex-valued taps, in accordance with the invention.
[0019] FIG. 11 shows an FIR filter chain with 64-QAM blocks for multiplication in accordance with the invention.
[0020] FIG. 12 is a table showing selection operations in accordance with the invention.
[0021] FIG. 13 illustrates a basic add/sub structure for implementing an FIR filter, in accordance with the invention.
[0022] FIG. 14 illustrates a first stage of an adder tree for implementing an FIR filter, in accordance with the invention.
[0023] FIG. 15 is a table comparing, for different types of modulation, the number of required add/sub operations
and the corresponding latency in an LMS technique in accordance with the invention.
[0024] FIG. 16 illustrates an LMS update operation in accordance with the invention.
[0025] FIG. 17 is a table comparing, for different types of modulation, the number of required add/sub operations
per coefficient to initialize a trellis of the Viterbi algorithm in accordance with the invention.
[0026] FIGS. 18 and 19 show examples of communication system receivers in which the techniques of the invention
may be implemented.



Detailed Description Of The Invention 


[0027] The present invention will be described herein with reference to particular types of signal processing devices, 
such as FIR filters, channel estimators, channel equalizers and Viterbi decoders. It should be understood, however, 
that the invention is more generally applicable to other types of signal processing devices and applications.

1. Basic Rotated Constellation Approach

[0028] The invention will first be illustrated with an example based on QPSK modulation. Rather than using a
conventional constellation, as shown in FIG. 1(a), with the constellation points $u_i \in \{\frac{1}{\sqrt{2}}(\pm 1 \pm j)\}$,
the invention utilizes a rotated constellation with points $u_i \in \{1, j, -1, -j\}$. The resulting rotated constellation
is shown in FIG. 1(b). Rotating the constellation points in accordance with the invention does not change the behavior
of the digital modulation scheme nor its transmission since any given hardware-dependent implementation as well as the
transmission channel will typically add arbitrary rotations in any case. It has been noted in G.M. Durant and
S. Ariyavisitakul, "Implementation of a broadband equalizer for high-speed wireless data applications," Proc. IEEE
ICUPC 98, Florence, Italy, Oct. 1998, that using $u_i \in \{\pm 1 \pm j\}$ rather than
$u_i \in \{\frac{1}{\sqrt{2}}(\pm 1 \pm j)\}$ saves complexity since multiplications can be performed as add/sub
operations. However, the rotated constellation of FIG. 1(b) simplifies this operation even further. To illustrate this,
consider the implementation of an inner vector product, an operation commonly used in the above-noted LMS and
Viterbi algorithms. Elements of a channel estimate, e.g., are combined in a row vector $w = [w_1, w_2, \ldots, w_M]$, while
transmitted modulated symbols are represented by a row vector $u = [u_1, u_2, \ldots, u_M]$. The inner vector product to
compute is thus given by

$$u w^T = \sum_{i=1}^{M} u_i w_i,$$

with T denoting the transpose operation. In other words, a complex multiplication is required to multiply each transmitted
symbol $u_i$, where $i = 1 \ldots M$, with one of the channel estimate weights $w_i$. A complex multiplication usually requires
four real multiplications and two add/subtract (add/sub) operations. That is,

$$u_i w_i = \mathrm{Real}(u_i)\,\mathrm{Real}(w_i) - \mathrm{Imag}(u_i)\,\mathrm{Imag}(w_i) + j\{\mathrm{Real}(u_i)\,\mathrm{Imag}(w_i) + \mathrm{Imag}(u_i)\,\mathrm{Real}(w_i)\}.$$

Other realizations with three multiplications and more add/sub operations are possible:

$$A = \mathrm{Imag}(u_i) \times \{\mathrm{Real}(w_i) + \mathrm{Imag}(w_i)\} \tag{1}$$

$$B = \mathrm{Real}(w_i) \times \{\mathrm{Real}(u_i) + \mathrm{Imag}(u_i)\} \tag{2}$$

$$C = \mathrm{Real}(u_i) \times \{\mathrm{Imag}(w_i) - \mathrm{Real}(w_i)\} \tag{3}$$

$$u_i w_i = B - A + j(B + C). \tag{4}$$
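
As a quick check of Equations (1) through (4), the following Python sketch evaluates the three-multiplication form. The function name `three_mult_product` is hypothetical and serves only as an illustration; it is not part of the described hardware.

```python
def three_mult_product(u: complex, w: complex) -> complex:
    """Complex product u*w using three real multiplications, per Eqs. (1)-(4)."""
    a = u.imag * (w.real + w.imag)   # Eq. (1)
    b = w.real * (u.real + u.imag)   # Eq. (2)
    c = u.real * (w.imag - w.real)   # Eq. (3)
    return complex(b - a, b + c)     # Eq. (4)


if __name__ == "__main__":
    u, w = complex(3, -1), complex(2, 5)
    # Both values should agree: the direct product and the three-multiplication form.
    print(u * w, three_mult_product(u, w))
```

The saving of one multiplication is paid for with three extra add/sub operations, which is why the text treats it as an alternative rather than a clear improvement.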

If, however, the structure of the rotated QPSK constellation, $u_i \in \{1, j, -1, -j\}$, is taken into account, the
multiplications completely disappear, and only the following four cases remain:

$$u_i = 1: \quad u_i w_i = \mathrm{Real}(w_i) + j\,\mathrm{Imag}(w_i) \tag{5}$$

$$u_i = j: \quad u_i w_i = -\mathrm{Imag}(w_i) + j\,\mathrm{Real}(w_i) \tag{6}$$

$$u_i = -1: \quad u_i w_i = -\mathrm{Real}(w_i) - j\,\mathrm{Imag}(w_i) \tag{7}$$

$$u_i = -j: \quad u_i w_i = \mathrm{Imag}(w_i) - j\,\mathrm{Real}(w_i). \tag{8}$$

In other words, the multiplication becomes a selection operation. As will be described in greater detail below, even the
add/sub operations are not necessary.
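
The four cases of Equations (5) through (8) can be sketched in Python as a pure selection. This is a minimal illustration, assuming the symbol is identified by an index `k` with $u_i = j^k$; that indexing and the function name `qpsk_select_multiply` are assumptions made for the example, not the bit mapping of FIG. 2.

```python
def qpsk_select_multiply(w_re, w_im, k):
    """Return (re, im) of u*w for u = j**k, k in {0, 1, 2, 3}.

    Only swaps and sign changes are used, mirroring Eqs. (5)-(8).
    """
    if k == 0:       # u = 1,  Eq. (5)
        return w_re, w_im
    if k == 1:       # u = j,  Eq. (6)
        return -w_im, w_re
    if k == 2:       # u = -1, Eq. (7)
        return -w_re, -w_im
    if k == 3:       # u = -j, Eq. (8)
        return w_im, -w_re
    raise ValueError("k must be 0, 1, 2 or 3")
```

Because the arguments are only swapped and negated, the result is exact for integer (fixed-point) inputs, which matches the claim that no precision is lost.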

[0029] FIG. 2 shows one possible mapping of pairs of bits $b_1$ and $b_0$ into the rotated QPSK constellation points
$u_i \in \{1, j, -1, -j\}$ of FIG. 1(b). Arbitrary mappings, such as Gray coding, can be implemented by first applying an
appropriate conversion mapping, followed by the FIG. 2 mapping.

[0030] FIG. 3 illustrates a basic selector structure for implementing the above-noted selection operation, i.e., the
selection process for a QPSK modulation that leads to a complex multiplication according to the table of FIG. 2 and
Equations (5) through (8) above. The basic selector structure includes inverters 10-1 and 10-2, and switches 12-1 and
12-2, interconnected as shown. As is apparent from FIG. 3, the selection operation does not require any multiplication
or addition. Instead, a two's complement logic is assumed as well as some basic logic to control the selectors based
on the values of bits $b_0$ and $b_1$. In this selector structure, the inverters 10-1 and 10-2 are assumed to be active high.
That is, inversion is performed if the control signal value is "1" and the input signal is passed through without inversion
when the control signal value is "0." The switches 12-1 and 12-2 are assumed to be connected to the upper position
when the control signal value is "1," and connected to the lower position when the control signal value is "0."

[0031] FIGS. 4 and 5 show possible hardware realizations for an operation to compute an inner vector product of
length M, utilizing the basic selector of FIG. 3. In this case, addition operations cannot be completely eliminated. Assume
that the partial sum is computed up to position $(k - 1)$:

$$s_{k-1} = \sum_{i=1}^{k-1} u_i w_i. \tag{9}$$

Then, the next step to compute $s_k$ is

$$s_k = s_{k-1} + u_k w_k, \quad k = 2 \ldots M. \tag{10}$$

The FIG. 4 implementation includes the elements of the FIG. 3 selector as well as adders 14-1 and 14-2 for adding
the real and imaginary parts of $s_{k-1}$ to the real and imaginary parts, respectively, of $u_k w_k$. The FIG. 5 implementation
eliminates the inverters 10-1 and 10-2 of the FIG. 3 selector by incorporating two's complement operations into
add/sub units 16-1 and 16-2. Thus, apart from the switches 12-1 and 12-2 and other supporting logic, only two add/sub
units are required if the sum is to be computed recursively. The add/sub units 16-1 and 16-2 are assumed to perform
the addition operation when the value of their corresponding control signal is "1".

[0032] FIG. 6 shows a fully recursive structure that can perform a complete FIR operation in accordance with Equation
(9) without requiring any additional hardware. The FIG. 6 structure corresponds to the FIG. 5 structure with the addition
of feedback to supply the real and imaginary parts of $s_{k-1}$ to the corresponding inputs of the add/sub units 16-1 and
16-2, respectively.
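
The recursion of Equations (9) and (10) can be sketched in Python as two accumulators, each updated by one add/sub per tap. This is an illustrative software model only: the symbol index `k` with $u = j^k$ and the function name `inner_product_rotated_qpsk` are assumptions for the example.

```python
def inner_product_rotated_qpsk(weights, symbol_indices):
    """Accumulate sum_k u_k * w_k with u_k = j**symbol_indices[k].

    weights is a sequence of (re, im) integer pairs. Each tap costs one
    add/sub on the real accumulator and one on the imaginary accumulator;
    no multiplication is performed.
    """
    s_re, s_im = 0, 0
    for (w_re, w_im), k in zip(weights, symbol_indices):
        if k == 0:        # u = 1:  add w_re, add w_im
            s_re += w_re
            s_im += w_im
        elif k == 1:      # u = j:  subtract w_im, add w_re
            s_re -= w_im
            s_im += w_re
        elif k == 2:      # u = -1: subtract w_re, subtract w_im
            s_re -= w_re
            s_im -= w_im
        elif k == 3:      # u = -j: add w_im, subtract w_re
            s_re += w_im
            s_im -= w_re
        else:
            raise ValueError("symbol index must be 0, 1, 2 or 3")
    return s_re, s_im
```

The switch position corresponds to which part of the weight is routed to each accumulator, and the add/sub control corresponds to the sign, so no separate inverter is modeled.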

1.1 Extension to QAM Constellations 

[0033] The above-described techniques can be applied to larger signal constellations, as will now be described with
reference to an example based on a 16-QAM constellation. FIG. 7(a) shows the 16-QAM constellation, with a set of
16 constellation points, $u_i \in \{\pm 1 \pm j, \pm 1 \pm 3j, \pm 3 \pm j, \pm 3 \pm 3j\}$. In accordance with the invention,
the FIG. 7(a) constellation is rotated by 45°, resulting in the rotated constellation as shown in FIG. 7(b). FIGS. 7(c)
through 7(f) show four different subsections of the rotated constellation of FIG. 7(b).

[0034] Each subsection in this example corresponds to a translated QPSK constellation. Note also that the center
point of each subsection corresponds to a point in a rotated QPSK constellation. Thus, a first step involves the selection
of the center point of a subsection, which is the same as selecting a point in a rotated QPSK constellation as described
previously. In a second step, the actual signal is selected. This is another selection process, similar to the previous
selection. Note, however, that in the first step the value corresponding to the center point of each subsection is twice
as large as the correction signal that is to be added. Thus, before the second step, a shift operation is required. In
other words, in order to implement a multiplication with a symbol from a 16-QAM constellation, the corresponding
channel weight is first selected as described in Equations (5) through (8) above, then shifted, and finally added to
another selected value, as illustrated in FIG. 8.

[0035] The FIG. 8 structure includes a selector (SEL) 30, a left-shift (L-SH) element 32, and a recursive selector
(REC) 34. The selector 30 in FIG. 8 corresponds to the selector shown in FIG. 3, and the recursive selector 34
corresponds to the recursive structure of FIGS. 4 or 5. The left-shift element 32 implements the above-described left
shift operation.

[0036] An example of the processing implemented by the FIG. 8 structure is as follows. Consider the constellation
point $(1 + 3j)$ in the conventional 16-QAM constellation, which is now mapped into $(2 + j)$ in the manner previously
described. Note that the definition of one unit in actual implementation is arbitrary. In the first step, a multiply by 2
operation is performed using a selection operation implemented by selector 30, followed by a shift operation
implemented by element 32. After that, the coefficient is multiplied by $j$, which is another selection operation. Finally,
the two values obtained from the two steps are added together. This complex multiplication is implemented using two
real add operations in the recursive selector 34.
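
The two-step chain can be sketched in Python by writing each rotated 16-QAM point as $2c + d$, where $c$ and $d$ are rotated QPSK points (the subsection center and the offset within it). The index arguments `k_center` and `k_offset` (with $c = j^{k_{center}}$, $d = j^{k_{offset}}$) and the function names are assumptions made for this illustration.

```python
def select(w_re, w_im, k):
    """(re, im) of j**k * w using only swaps and sign changes."""
    return ((w_re, w_im), (-w_im, w_re), (-w_re, -w_im), (w_im, -w_re))[k]


def qam16_multiply(w_re, w_im, k_center, k_offset):
    """Multiply integer weight w by the rotated 16-QAM point 2*j**k_center + j**k_offset."""
    c_re, c_im = select(w_re, w_im, k_center)   # first selection (SEL)
    c_re, c_im = c_re << 1, c_im << 1           # left shift by one bit (L-SH)
    d_re, d_im = select(w_re, w_im, k_offset)   # second selection
    return c_re + d_re, c_im + d_im             # two real additions (REC)


if __name__ == "__main__":
    # The point 2 + j from the text: center 1 (k_center = 0), offset j (k_offset = 1).
    print(qam16_multiply(5, -3, 0, 1), (2 + 1j) * complex(5, -3))
```

Under this decomposition the sixteen values of $2c + d$ coincide with the conventional 16-QAM points multiplied by $(1 - j)/2$, which is consistent with the mapping of $(1 + 3j)$ into $(2 + j)$ given in the text.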

[0037] FIG. 9 shows the recursive structure for multiplying with a 64-QAM constellation point. Compared to the FIG.
8 recursive structure for the 16-QAM constellation, the structure for the 64-QAM constellation requires a three stage
recursion, and thus an additional left shift element 36 and recursive selector 38. The structures for larger constellations
can be generated in a similar manner, as will be apparent to those skilled in the art.

1.2 Minimal Operations 


[0038] The techniques of the invention as described thus far not only allow considerable reductions in computational
complexity, but are also well suited for pipelined implementations. In some applications, to be described in detail below,
it is of importance to compute an entire set of possible outcomes when multiplying a symbol from a limited alphabet
size with a complex value $w_i$. In this case, the complexity can be reduced further since many intermediate results can
be re-used.

[0039] Taking the first quadrant for the 16-QAM constellation of FIG. 7 as an example, the four possible coefficients
are

$$A + jB = w_R + j w_I, \tag{11}$$

$$C + jD = 3w_R + 3j w_I, \tag{12}$$

$$E + jF = 2w_R - w_I + j(2w_I + w_R), \tag{13}$$

$$G + jH = 2w_R + w_I + j(2w_I - w_R). \tag{14}$$

The operation in the first line is free, while the second, third and fourth lines each cost two adds. Since all other
values can be derived from multiplying several times by $j$, the values are obtained by flipping the real and imaginary
values and inverting them. Since only eight different values (A-H) are involved, an additional eight inverters are required.
Thus, the complete cost is 6 adders and 8 inverters, or equivalently, 14 add/sub operations. Similarly, if this method is
applied to a 64-QAM constellation, 36 add/sub operations plus 32 inverters are required.
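
The re-use described above can be sketched in Python as follows. The sketch computes the four first-quadrant products of Equations (11) through (14) with six real add/sub operations and derives the remaining twelve by repeated multiplication by $j$ (swap and negate). The function name `all_qam16_products` and the dictionary return type are assumptions chosen for the illustration.

```python
def all_qam16_products(w_re, w_im):
    """Map every rotated 16-QAM point u to (re, im) of u*w for integer w."""
    d_re, d_im = w_re << 1, w_im << 1       # 2*w by a left shift
    first_quadrant = [
        (1, (w_re, w_im)),                  # Eq. (11): free
        (3, (d_re + w_re, d_im + w_im)),    # Eq. (12): two adds
        (2 + 1j, (d_re - w_im, d_im + w_re)),  # Eq. (13): two add/subs
        (2 - 1j, (d_re + w_im, d_im - w_re)),  # Eq. (14): two add/subs
    ]
    products = {}
    for u, (re, im) in first_quadrant:
        u = complex(u)
        for _ in range(4):
            products[u] = (re, im)
            # Multiplying by j swaps the parts and negates the new real part.
            u, re, im = u * 1j, -im, re
    return products
```

In this model the negations in the rotation loop play the role of the inverters counted in the text, while the six add/sub lines correspond to the six adders.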

2. Implementation Examples 

[0040] It will now be described how the above-described techniques can be used to implement FIR filters, LMS
algorithms and ML sequence detection using the Viterbi algorithm.

2.1 FIR Filter 

[0041] FIG. 10 shows a table comparing the number of operations required in an FIR filter implemented using the
techniques of Section 1.1 with that of a conventional FIR filter, for different types of modulation. A conventional FIR
filter with M complex valued coefficients requires 4M real multiplications and (4M - 2) real additions. The table of FIG.
10 shows the number of real add/sub operations required when applying the techniques of the invention. It can be
seen from the table that, even for large constellations such as 1024-QAM, the techniques of the invention provide
considerably lower complexity.

[0042] FIG. 11 shows a possible implementation of an FIR filter chain using a concatenation of the 64-QAM blocks
of FIG. 9 for multiplication. The concatenated 64-QAM blocks are denoted 40-1, 40-2, ... 40-M. The outputs of the
blocks 40-1, 40-2, ... 40-M are combined in corresponding adders 42-1, 42-2, ... 42-(M-1). Another possible approach
is to combine all first stages of the multiplication operations first, then the second and finally the third stages. This
approach can save mantissa length and thus chip area. Typical values of M are around three to 64 while 1024-QAM
or smaller constellations are typically used.

[0043] Note, however, that the concept of implementing FIR filters of high order generally requires a long chain of
adders. Many applications can apply pipelining to this structure in order to increase throughput at the expense of
increased latency and increased chip area due to additional registers. Also note that the new selector techniques of
the invention generally require only a few bits of information to be stored for each symbol, e.g., two bits for QPSK, so
that a pipelining implementation of the technique with its additional registers typically does not require significantly
more area.

[0044] It should also be noted that some applications, such as the LMS algorithm, do not allow for latency. In this
case, a hierarchical tree structure can be used to reduce the delay from $(M - 1)T_a$ to $T_a \log_2 M$, where $T_a$ is the delay
time of one adder. For implementing such a tree structure, the basic selector structure of FIG. 3 may be preferable,
although now two two's complement inverters will generally be required for each coefficient, i.e., 2M such inverters
altogether. These two's complement inverters not only require cell area but also introduce a carry-ripple effect similar to
that of standard adders, leading to an additional delay. Structures which avoid this additional delay will now be described
with reference to FIGS. 12, 13 and 14.

[0045] FIG. 13 shows a basic add/sub structure for a fast FIR filter implementation suited to add or subtract two
numbers A and B, i.e., to perform the operation $Z = \pm A \pm B$. For such an operation, either two add/sub structures or
one add/sub with an additional inverter would generally be necessary. If, however, the sign signal is treated separately,
just a pair of switches 50-1, 50-2 and one add/sub structure 52 is sufficient, as illustrated in FIG. 13. This results in a
savings in area as well as latency. In order to realize the operation efficiently, the sign information, S(A) and S(B), for
the two numbers, A and B, has to be provided. The result of the operation is a signed number Z with new sign information
S(Z) and the add/sub selection:

$$S(Z) = S(A) \wedge S(B), \tag{15}$$

$$\mathrm{ADD/SUB} = S(A) \oplus S(B). \tag{16}$$

The "wedge" operator $\wedge$ stands for the logical AND function, and $\oplus$ denotes the exclusive-OR function. FIG. 12
shows a table listing the four possible operations in terms of the sign operators and the outgoing signal Z. Furthermore,
a multiplexer is required to select the correct input in case of a subtraction (A - B, B - A). The number Z is passed along
together with its sign S(Z) to the following adder structure. The first stage only requires an additional selector for the
coefficients. For adding two numbers A and B, only the first operation A + B is required.
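
The sign handling of Equations (15) and (16) can be sketched in Python as below. The sketch assumes that a sign flag of 1 means "negate this operand" and that an exclusive-OR result of 1 selects subtraction; both polarities, as well as the function name `signed_add_sub`, are assumptions, since the table of FIG. 12 is not reproduced here.

```python
def signed_add_sub(a, s_a, b, s_b):
    """Combine (-1)**s_a * a and (-1)**s_b * b using a single add/sub.

    Returns (z, s_z) such that (-1)**s_z * z equals the signed sum.
    """
    s_z = s_a & s_b           # Eq. (15): result stays negated only if both are
    subtract = s_a ^ s_b      # Eq. (16): signs differ, so subtract (assumed polarity)
    if not subtract:
        z = a + b             # signs agree: +(A + B) or -(A + B)
    elif s_b:
        z = a - b             # only B negated: multiplexer picks A - B
    else:
        z = b - a             # only A negated: multiplexer picks B - A
    return z, s_z
```

The pending sign `s_z` is what gets handed to the next stage of the adder tree, so no two's complement inversion is needed between stages.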

[0046] Treating the sign separately as described in FIGS. 12 and 13 also provides a number of additional advantages.
For example, a QPSK constellation, as well as larger QAM constellations, can be implemented by assigning different
values for the signs of A and B. The modulation is thus already part of the adder-tree. FIG. 14 shows a first stage of
an adder tree for a fast FIR filter implementation, illustrating this feature. The first stage of the tree includes switches
60-1, 60-2, 60-3 and 60-4, and add/sub units 62-1 and 62-2, interconnected as shown.

[0047] Another advantage of treating the sign separately is that the sign information can be passed along to the next
add/sub stage in the adder tree and can thus be incorporated in a subsequent add operation without the need for
additional inverters. At the end of the tree structure, one inverter might be necessary to provide final correction for the
sign. It will be shown in the following section that the LMS algorithm can use this sign information immediately so that
even this final inverter is not required.

[0048] As described previously, the adder tree comprises add/sub structures as shown in FIG. 13, and the first stage
can combine the selection of the coefficients and the add operations. This is shown in the FIG. 14 structure, which
assumes that two complex coefficients $A_R + jA_I$ and $B_R + jB_I$ are QPSK modulated by the corresponding bits
$\{a_0, a_1\}$ and $\{b_0, b_1\}$. The outgoing sign information of the output $Z_R + jZ_I$ is given by

$$S(Z_R) = (a_0 \oplus a_1) \wedge (b_0 \oplus b_1) \tag{17}$$

$$S(Z_I) = a_1 \wedge b_1. \tag{18}$$

Subsequent adder stages are constructed in a straightforward manner in accordance with Equations 15 and 16.

2.2 LMS Algorithm

[0049] The LMS algorithm is known to be of 2M complexity, and generally requires 8M real multiplications for a
complex input. Its complexity is defined by two steps:

$$e(i) = d(i) - u_i w_i^T, \tag{19}$$

the error equation with

$$u_i = [u(i), u(i-1), \ldots, u(i-M+1)] \tag{20}$$

and the coefficient update equation:

$$w_{i+1} = w_i + \mu\, e(i)\, u_i^*, \tag{21}$$

with

$$w_i = [w_i(0), w_i(1), \ldots, w_i(M-1)], \tag{22}$$

where $w_i(l)$ denotes the tap weight at time instant $i$ with index $l$ ranging from 0 to $M - 1$. If the technique as explained
in the previous section is applied, the multiplications for the error $e(i)$ computation can be substituted by select and
add/sub operations. To compute the coefficient update (Equation 21), if the step-size $\mu$ is a power of two
($\mu = 2^{-k}$), the multiplication $\mu e(i)$ can be replaced by a simple shift operator. The remaining multiplication with the
symbols $u_i$ can also be achieved with the new technique. All that remains is to update the coefficient, which is a
complex addition.
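
One LMS iteration per Equations (19) and (21) can be sketched in Python for rotated QPSK symbols. This is a floating-point software model under stated assumptions: symbols are given as indices `k` with $u = j^k$, the step size is $\mu = 2^{-\text{shift}}$, and the names `select_multiply` and `lms_step` are hypothetical. In fixed-point hardware the `ldexp` scaling would be a bit shift.

```python
import math


def select_multiply(w, k):
    """Return j**k * w using only swaps and sign changes of the parts of w."""
    re, im = w.real, w.imag
    return (complex(re, im), complex(-im, re),
            complex(-re, -im), complex(im, -re))[k]


def lms_step(w, symbol_indices, d, shift):
    """One LMS iteration with mu = 2**-shift; returns (new_weights, error)."""
    # Eq. (19): e = d - sum_l u(l) * w(l), each product obtained by selection.
    e = d
    for w_l, k in zip(w, symbol_indices):
        e -= select_multiply(w_l, k)
    # mu * e: scaling by a power of two instead of a multiplication.
    scaled = complex(math.ldexp(e.real, -shift), math.ldexp(e.imag, -shift))
    # Eq. (21): conj(j**k) = j**((4 - k) % 4), so this is again a selection.
    new_w = [w_l + select_multiply(scaled, (4 - k) % 4)
             for w_l, k in zip(w, symbol_indices)]
    return new_w, e
```

The only arithmetic left in the loop bodies is addition and subtraction, which is the point of the paragraph above.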

[0050] FIG. 15 is a table showing the complexity and minimum latency of the LMS algorithm, as implemented using
the techniques of the invention, for different modulation techniques. A QPSK constellation requires 2M add/sub
operations to evaluate the error equation and another 2M to update the coefficients. For 16-QAM, the error computation
requires 4M add/sub operations, the multiplication $e(i)u_i^*$ requires 2M add/sub operations, and the coefficient update
requires 2M add/sub operations, i.e., 8M add/sub operations are required altogether. In summary, 4M, 8M and 12M add/sub
operations are required for QPSK, 16-QAM and 64-QAM constellations, respectively. Thus, each time the constellation size
is increased by a factor of four, another 4M operations are required.

[0051] As mentioned in Section 2.1 above, faster realizations are sometimes required. For the LMS algorithm, the
update cannot be performed until the error signal is available. Thus, a very rapid FIR filter chain is important. This can
be implemented using the hierarchical tree structure described in Section 2.1. FIG. 16 shows an exemplary
implementation of this type, comprising add/sub units 70-1 and 70-2, selectors 72-1 through 72-4, and an add/sub
hierarchical tree structure 74. The computation of the error signal in this case requires an additional subtraction:
$e(i) = d(i) - u_i w_i^T$. If the order of the filter is chosen to be $M = 2^L - 1$, $d(i)$ can be treated as one filter
tap-weight element and does not require additional delay.

[0052] As previously mentioned, depending on the sign of the last signal (in this case the error signal), a final two's 
complement inverter may be required. This would cost an additional delay since two's complement inverters cause 
carry-ripple effects. However, in the LMS algorithm, the last inverter can be incorporated into the selection process of 
the coefficient update, i.e., if the negative information is active, -u.e(i) is applied rather than ue(/). 
[0053] The modified selection is faster because it is realized by simple logic gates and does not require an adder
structure. For QPSK, one complete update cycle requires T_a log_2 M for the FIR error part, possibly one T_a for multiplying
the error by the step-size μ, and finally one add operation for the updates of the coefficients. Thus, the minimum
update time is given by (log_2 M + 2) T_a. If the step-size μ is a negative power of two, the multiplication by the error is a
simple scaling operation and can therefore be realized without any additional time delay. Other values of the step-size
can be approximated with a sum of two such values, i.e., μ = 2^-l ± 2^-k, so that one add/sub operation is sufficient for
the multiplication with the error. This operation gives a wide range for possible step-size values. The possible additional
operation is indicated with a dashed box 75 in FIG. 16.
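The update cycle described above can be sketched in Python as follows. This is an illustrative sketch only, under stated assumptions: the 45°-rotated QPSK alphabet is taken to be {1, j, -1, -j} (symbol index s stands for j^s), the step-size is a single negative power of two, and the names `times_symbol`, `scale_pow2` and `lms_step` are hypothetical. Floating-point values are used for readability; the power-of-two scaling via `math.ldexp` stands in for a hardware shift.

```python
import math


def times_symbol(z, s):
    """Return z * j**s for s in 0..3 using only swaps and sign changes."""
    if s == 0:
        return z
    if s == 1:
        return complex(-z.imag, z.real)
    if s == 2:
        return complex(-z.real, -z.imag)
    return complex(z.imag, -z.real)


def scale_pow2(z, shift):
    """Scale z by 2**-shift (models a right shift; no multiplier needed)."""
    return complex(math.ldexp(z.real, -shift), math.ldexp(z.imag, -shift))


def lms_step(w, u, d, shift):
    """One LMS update with step-size mu = 2**-shift.

    w: list of complex tap weights
    u: list of symbol indices (0..3), one per tap
    d: desired (received) complex sample
    Returns (updated weights, error signal).
    """
    # FIR part: each product w_k * u_k is a selection, then plain additions.
    y = 0j
    for wk, sk in zip(w, u):
        y += times_symbol(wk, sk)
    e = d - y
    # Update: w_k += mu * e * conj(u_k); conj(j**s) equals j**((4 - s) % 4).
    step = scale_pow2(e, shift)
    w_new = [wk + times_symbol(step, (4 - sk) % 4) for wk, sk in zip(w, u)]
    return w_new, e
```

Apart from the additions in the FIR sum and the coefficient update, every operation here is a selection, a sign change or a shift, which is the property the paragraph above relies on.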

[0054] As an example, assume that the LMS algorithm is applied to train a channel estimator of order M = 15, and
the technology used implements an add/sub operation in 1 ns and a multiplication operation in 6 ns. Thus, the whole
update for a BPSK/QPSK training sequence can be performed in 6 ns with add/sub operations, while it takes about 12
ns when using multipliers. The required chip area and power consumption, on the other hand, might be ten times
higher when multipliers are implemented. Real-time processing is thus possible for up to 1/(6 ns) ≈ 166 Msymbols-per-second
in this case.
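The arithmetic of this example can be reproduced with a few lines of Python. The sketch below is an assumption-laden illustration rather than the patent's own formula: the helper name `lms_update_time_ns` is hypothetical, and the tree depth is taken as ceil(log_2(M + 1)) because d(i) is treated as one extra tree input when M = 2^L - 1.

```python
import math


def lms_update_time_ns(M, t_add_ns, step_size_needs_add=True):
    """Estimated time of one add/sub-only LMS update cycle.

    tree levels  : ceil(log2(M + 1)) add times for the FIR error part
    step-size    : one add time unless mu is a single power of two
    coefficients : one add time for the tap update
    """
    tree_levels = math.ceil(math.log2(M + 1))
    extra = 2 if step_size_needs_add else 1
    return (tree_levels + extra) * t_add_ns


update_ns = lms_update_time_ns(15, 1.0)   # 6.0 ns for M = 15, T_a = 1 ns
symbol_rate_msps = 1e3 / update_ns        # about 166.7 Msymbols per second
```

If the step-size is a single negative power of two, the same helper gives 5 ns, since the scaling then adds no delay.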

2.3 ML Sequence Detection

[0055] In a digital communication system that transmits information over a channel causing Inter-Symbol-Interference
(ISI), the optimum detector is a maximum-likelihood symbol sequence detector (MLSD), as described in, e.g., J.G.
Proakis, "Channel equalization," The Communications Handbook, CRC Press, 1997, Chapter 26; and G.D. Forney,
Jr., "Maximum-likelihood sequence detection in the presence of intersymbol interference," IEEE Trans. on Information
Theory, Vol. IT-18, pp. 363-378, May 1972.

[0056] An efficient algorithm for implementing an MLSD is the Viterbi algorithm, which was originally devised for decoding
convolutional codes. See the above-cited J.G. Proakis reference and G.D. Forney, Jr., "The Viterbi algorithm,"
IEEE Proceedings, Vol. 61, pp. 268-278, March 1973. In this case, the ISI channel is modeled as a Finite-State
Machine (FSM), called a trellis, with P^(M-1) states. Here P is the information symbol alphabet size and M is the number
of complex-valued channel FIR filter coefficients, w_i. In the trellis, there are P transitions diverging from each state,
corresponding to the P different values of the information symbol, u(k). The values associated with the transitions
between the states are w_i u(k), i.e., the possible received values, given the estimated channel coefficients w_i.
[0057] Assuming that the received sequence is r(k), the estimated channel coefficients are w_i and the input information
symbols are u(k), the Viterbi algorithm finds the most-likely transmitted symbol u(k) by recursively finding the
path in the trellis that is closest in Euclidean distance to the received noisy sequence r(k). That is, it implements the
ML detector criterion by recursively minimizing with respect to u(k),



At each recursion step, the Viterbi algorithm searches over the P^M possible transitions. If the channel model coefficients
change from one recursion to the next, then for every one of the M estimated channel coefficients w_i, P multiplications
with the P symbols of the symbol alphabet are required. Thus, a total of PM = O(PM) complex multiplications are
required to compute the various terms initially. In order to compute the cost metrics of the P^M transitions between the
states in the FSM, another (M - 1) × P^M = O(M P^M) complex additions are required. If this is done using a conventional
method, a complex multiplication is performed with four to eight add/sub operations depending on the symbols. The
value (3 + 7j), for example, requires one addition for the multiplication by three (3 = 1 + 2) and one subtraction for seven
(7 = 8 - 1). This needs to be performed on the real as well as on the imaginary part of the coefficient, resulting in O(PM)
= 4PM. Finally, the real and imaginary parts need to be added to r(k), which requires another 2 add operations. For all
M coefficients and P^M states, there are thus a total of O(M P^M) = 2M P^M add operations.

[0058] Using the techniques of the invention as described in Section 1.1 above, complexity can be reduced considerably.
For QPSK, multiplication of each coefficient w_i with u(k) becomes a selection process, and O(PM) = 0. Only
the computation of the transitions remains with O(M P^M) = 2M P^M to compute



in Equation 16. For 16-QAM, the first step is a selection process, and the next is two add/subs to compute each multiplication
per symbol, thus O(PM) = 2PM = 32M. For 64-QAM, in each subset of four symbols 10 add/subs are required for
computing the multiplication, thus O(PM) = 160M. The application of this technique has the additional advantage that
it can readily be pipelined without adding too much chip area for the additional registers.
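For the QPSK case, the selection-based initialization and the trellis search can be sketched together in Python. The sketch rests on several assumptions that are not taken from the text: the rotated QPSK alphabet is {1, j, -1, -j} (index s stands for j^s), the M - 1 symbols preceding the data are known and equal to 1, and the names `times_symbol` and `viterbi_qpsk` are hypothetical. The squared Euclidean distance is computed conventionally here; only the products of coefficients and symbols are multiplier-free.

```python
def times_symbol(z, s):
    """Return z * j**s for s in 0..3 using only swaps and sign changes."""
    if s == 0:
        return z
    if s == 1:
        return complex(-z.imag, z.real)
    if s == 2:
        return complex(-z.real, -z.imag)
    return complex(z.imag, -z.real)


def viterbi_qpsk(r, w):
    """Full-search MLSD over a trellis with 4**(M-1) states.

    r: received complex samples
    w: estimated channel coefficients w_0 .. w_{M-1}
    Returns the most likely sequence of symbol indices (0..3).
    """
    M = len(w)
    # Initialization: table[i][s] = w_i * j**s, built by selection only.
    table = [[times_symbol(wi, s) for s in range(4)] for wi in w]
    start = (0,) * (M - 1)          # assumed known preamble symbols
    cost = {start: 0.0}
    paths = {start: []}
    for rk in r:
        new_cost, new_paths = {}, {}
        for state, c in cost.items():
            for s in range(4):
                hist = (s,) + state  # indices of u(k), u(k-1), ..., u(k-M+1)
                y = sum((table[i][hist[i]] for i in range(M)), 0j)
                d = rk - y
                metric = c + d.real * d.real + d.imag * d.imag
                nxt = hist[:M - 1]
                # Keep only the survivor with the smallest metric per state.
                if nxt not in new_cost or metric < new_cost[nxt]:
                    new_cost[nxt] = metric
                    new_paths[nxt] = paths[state] + [s]
        cost, paths = new_cost, new_paths
    best = min(cost, key=cost.get)
    return paths[best]
```

If the channel is constant over a frame, `table` is built once per call, which corresponds to the initialization cost O(PM); the loop over states and symbols corresponds to the O(M P^M) part.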

[0059] At the expense of increasing pipelining complexity, the minimal operations described in Section 1.2 can reduce
the computation complexity even further. For example, for 16-QAM modulation, O(PM) = 14M and for 64-QAM, O(PM)
= 68M.

[0060] Initializing the trellis at the beginning of an equalization process requires computing all possible multiplications
with all the elements in the symbol alphabet once. This assumes that the channel remains constant over a frame of
data. If the channel is rapidly changing from symbol to symbol, this initialization has to be performed at every recursion
to update the transition values of the trellis. The obtained values can be stored in a look-up table for computing the
Euclidean norm in the Viterbi algorithm. FIG. 17 shows a table comparing the complexity for initialization (O(PM)) of
the Section 1.1 and Section 1.2 implementations with a conventional implementation, for different types of modulation.
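The operation counts quoted in the preceding paragraphs can be collected in a small Python helper. This is only a convenience sketch: the names `init_addsub_counts` and `trellis_size` are hypothetical, the numbers are those stated in the text (4PM conventional, 32M and 160M for Section 1.1, 14M and 68M for Section 1.2), and the Section 1.2 entry for QPSK is assumed to be zero because nothing remains to be reduced. It does not reproduce FIG. 17 itself.

```python
ALPHABET_SIZE = {"QPSK": 4, "16-QAM": 16, "64-QAM": 64}


def init_addsub_counts(modulation, M):
    """Add/sub operations needed to initialize the trellis, O(PM)."""
    P = ALPHABET_SIZE[modulation]
    per_tap_1_1 = {"QPSK": 0, "16-QAM": 32, "64-QAM": 160}
    per_tap_1_2 = {"QPSK": 0, "16-QAM": 14, "64-QAM": 68}  # QPSK value assumed
    return {
        "conventional": 4 * P * M,
        "section_1_1": per_tap_1_1[modulation] * M,
        "section_1_2": per_tap_1_2[modulation] * M,
    }


def trellis_size(modulation, M):
    """States, transitions and metric additions for a full trellis search."""
    P = ALPHABET_SIZE[modulation]
    transitions = P ** M
    return {
        "states": P ** (M - 1),
        "transitions": transitions,
        "metric_adds": 2 * M * transitions,
    }
```

For example, for 16-QAM with M = 5 the helper gives 320, 160 and 70 add/sub operations for the conventional, Section 1.1 and Section 1.2 initializations, respectively.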
[0061] In order to obtain the complete complexity of the Viterbi algorithm, O(M P^M) needs to be added if a full search
through the trellis is applied. This part can easily exceed the initial complexity of O(PM). Reduced complexity techniques
may be applied, such as reduced-state sequence estimation techniques that limit the search through the trellis, as
described in, e.g., M. Eyuboglu and S. Qureshi, "Reduced-state sequence estimation for coded modulation on intersymbol
interference channels," IEEE Journal on Selected Areas in Communications, Vol. 7, pp. 989-995, August 1989.
In this case, the initial complexity can become very significant. Using a tree structure in accordance with the invention
for implementing the add operations can further reduce the complexity as described in Section 1.1 above.


3. Receiver Examples 

[0062] FIG. 18 shows an illustrative embodiment of a receiver 100 in which the above-described signal processing
operations may be implemented. The receiver 100 receives a sequence of symbols transmitted by a transmitter 102.
The symbols are generated by transmitter 102 in accordance with a rotated constellation as previously described. The
receiver 100 includes an ML sequence detector (MLSD) 104 which implements the Viterbi algorithm, and is connected
in parallel with an LMS estimator 106. One or both of the MLSD 104 and LMS estimator 106 are implemented using
the computational techniques described above. The channel estimates from the LMS estimator 106 are fed to the
MLSD as shown. An optional channel decoder 108 is also included in the receiver 100.

[0063] FIG. 19 shows another example receiver 120 in which the invention may be implemented. The receiver 120,
which implements a decision feedback equalization (DFE) technique, includes the MLSD 104, LMS estimator 106
and optional channel decoder 108, and also includes a set of feed-forward filters 122 and a set of feedback filters 124.
The output of the LMS estimator in this case is used to determine adaptive filter taps for the set of feedback filters 124.
The received symbols are processed through feed-forward filters 122, and then applied to the MLSD 104. The output
of the MLSD is fed back through the feedback filters 124.

[0064] It should be noted that the arrangements of FIGS. 18 and 19 are examples only, and other embodiments may 
include different arrangements of elements and/or additional elements not explicitly shown. 

4. Conclusion 


[0065] The invention provides efficient computational methods and apparatus that allow large reductions in complexity
for implementations of communication algorithms that require multiplying a coefficient with a constellation point
in a PSK or QAM constellation. Chip area as well as latency can be reduced due to the substitution of multiplications
by simpler functions. Note that applying this technique does not cause any approximation or reduction in precision,
but simply removes area-intensive and time-intensive operations.

[0066] It should be noted that the above-described illustrative embodiments may be implemented in hardware, software
or combinations of hardware and software. For example, the computational structures illustrated in FIGS. 3-6, 8,
9, 11, 13, 14 and 16 may be implemented as elements of an application-specific integrated circuit (ASIC) or other digital
data processing device for use in a channel estimator, channel equalizer, demodulator, decoder or other element of a
digital communication system receiver.

[0067] Although the illustrative embodiments of the present invention have been described herein with reference to
the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and
that various other changes and modifications may be effected therein by one skilled in the art without departing from
the scope of the invention as set forth in the appended claims.


Claims 

1. A method of processing information in a receiver of a digital communication system, the method comprising the
step of:

applying a signal processing operation to a sequence of transmitted symbols, wherein the transmitted symbols
correspond to points in a first modulation constellation corresponding to a rotated version of a second modulation
constellation, and each of the transmitted symbols represents a particular number of information bits, and
further wherein use of the first modulation constellation allows the signal processing operation to be performed
using a reduced number of operations relative to the number of operations required in conjunction with the second
modulation constellation.

2. The method of claim 1 wherein use of the first modulation constellation allows the signal processing operation to
be performed without multiplication operations. 

3. The method of claim 1 wherein the first modulation constellation is generated by applying a 45° rotation to the 
second modulation constellation. 


4. The method of claim 1 wherein the second modulation constellation comprises one of a PSK constellation and a 
QAM constellation. 

5. The method of claim 1 wherein the signal processing operation comprises at least one of a finite impulse response
(FIR) filtering operation, a Least-Mean-Squares (LMS) estimation operation, and a Maximum-Likelihood (ML) sequence
detection operation using a Viterbi algorithm.

6. The method of claim 1 wherein the signal processing operation utilizes a selector to implement a complex multiplication
of a channel estimate coefficient with a symbol from the first modulation constellation.


7. The method of claim 6 wherein the selector receives as inputs real and imaginary parts of an element of the channel 
estimate coefficient, and generates as outputs real and imaginary parts of a product of the element of the channel 
estimate coefficient with a corresponding element of a given one of the symbols, without utilizing a multiplication 
operation. 


8. The method of claim 7 wherein the selector comprises first and second switches and first and second add/subtract
units, the first and second switches each selecting one of the real or the imaginary part of the element of the channel
estimate coefficient for application to a corresponding one of the add/subtract units, such that the add/subtract
units compute elements of real and imaginary parts of an inner vector product.


9. The method of claim 8 wherein an FIR filter operation is implemented using the selector by including feedback 
from outputs of the add/subtract units to corresponding inputs of the add/subtract units. 

10. The method of claim 1 wherein the signal processing operation comprises a multi-stage multiplication operation
implemented without multiplication operations, wherein each stage of the multi-stage operation corresponds to a
selector, and a left shift element is arranged between an output of a given one of the stages and a corresponding
input of a subsequent stage.

11. The method of claim 1 wherein the signal processing operation is implemented utilizing a multi-stage hierarchical
adder tree without multiplication operations.

12. An apparatus for use in processing information in a receiver of a digital communication system, the apparatus
comprising:

a signal processing circuit for processing a sequence of transmitted symbols, wherein the transmitted symbols
correspond to points in a first modulation constellation corresponding to a rotated version of a second modulation
constellation, and each of the transmitted symbols represents a particular number of information bits, and
further wherein use of the first modulation constellation allows the signal processing operation to be performed
using a reduced number of operations relative to the number of operations required in conjunction with the second
modulation constellation.


13. The apparatus of claim 12 wherein use of the first modulation constellation allows the signal processing operation
to be performed without multiplication operations.

14. The apparatus of claim 12 wherein the first modulation constellation is generated by applying a 45° rotation to the
second modulation constellation.

15. The apparatus of claim 12 wherein the other modulation constellation comprises one of a PSK constellation and 
a QAM constellation. 

16. The apparatus of claim 12 wherein the signal processing circuit comprises at least one of a finite impulse response
(FIR) filter, a Least-Mean-Squares (LMS) estimator, and a Maximum-Likelihood (ML) sequence detector implemented
using a Viterbi algorithm.

17. The apparatus of claim 12 wherein the signal processing circuit comprises at least one selector operative to implement
a complex multiplication of a channel estimate coefficient with a symbol from the first modulation constellation.

18. The apparatus of claim 17 wherein the selector receives as inputs real and imaginary parts of an element of the
channel estimate coefficient, and generates as outputs real and imaginary parts of a product of the element of the
channel estimate coefficient with a corresponding element of a given one of the symbols, without utilizing a multiplication
operation.

19. The apparatus of claim 18 wherein the selector comprises first and second switches and first and second add/subtract
units, the first and second switches each selecting one of the real or the imaginary part of the element of
the channel estimate coefficient for application to a corresponding one of the add/subtract units, such that the add/subtract
units compute elements of real and imaginary parts of an inner vector product.

20. The apparatus of claim 19 wherein the signal processing circuit comprises an FIR filter implemented using the
selector configured with feedback from outputs of the add/subtract units to corresponding inputs of the add/subtract
units.

21. The apparatus of claim 12 wherein the signal processing circuit comprises a multi-stage circuit implemented without
multiplication operations, wherein each stage of the multi-stage circuit corresponds to a selector, and a left shift
element is arranged between an output of a given one of the stages and a corresponding input of a subsequent
stage.

22. The apparatus of claim 12 wherein the signal processing circuit is implemented utilizing a multi-stage hierarchical 
adder tree without multiplication operations. 


23. An apparatus for use in processing information in a receiver of a digital communication system, the apparatus
comprising:

signal processing means for applying a signal processing operation to a sequence of transmitted symbols,
wherein the transmitted symbols correspond to points in a first modulation constellation corresponding to a rotated
version of a second modulation constellation, and each of the transmitted symbols represents a particular number
of information bits, and further wherein use of the first modulation constellation allows the signal processing operation
to be performed using a reduced number of operations relative to the number of operations required in
conjunction with the second modulation constellation.

24. A method of processing information in a transmitter of a digital communication system, the method comprising the
step of:

generating a sequence of transmitted symbols, wherein the transmitted symbols correspond to points in a
first modulation constellation generated by applying a predetermined rotation to a second modulation constellation,
and each of the transmitted symbols represents a particular number of information bits, and further wherein use
of the first modulation constellation allows a signal processing operation in a corresponding receiver of the system
to be performed using a reduced number of operations relative to the number of operations required in conjunction
with the second modulation constellation.

25. An apparatus for use in processing information in a transmitter of a digital communication system, the apparatus
comprising:

means for generating a sequence of transmitted symbols, wherein the transmitted symbols correspond to
points in a first modulation constellation representative of a rotated version of a second modulation constellation,
and each of the transmitted symbols represents a particular number of information bits, and further wherein use
of the first modulation constellation allows a signal processing operation in a corresponding receiver of the system
to be performed using a reduced number of operations relative to the number of operations required in conjunction
with the second modulation constellation.